54 research outputs found

    A Compact Index for Cartesian Tree Matching

    Cartesian tree matching is a recently introduced string matching problem in which two strings match if their corresponding Cartesian trees are the same. It is considered well suited to finding patterns by shape, especially in numerical time series data. While many related problems have been addressed, developing a compact index has received relatively little attention. In this paper, we present a 3n+o(n)-bit index that can count the number of occurrences of a Cartesian tree pattern in O(m) time, where n and m are the text and pattern lengths. To the best of our knowledge, this work is the first O(n)-bit compact data structure for indexing this problem.
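
    The matching rule in this abstract can be illustrated with a small sketch. It is not the 3n+o(n)-bit index from the paper; it is a naive checker that assumes the commonly used parent-distance encoding (for each position, the distance to the nearest earlier value that is not larger) and assumes ties are broken in favour of the leftmost element. The function names are hypothetical.

```python
def parent_distance(s):
    """For each i, distance to the nearest earlier position j with s[j] <= s[i], or 0 if none."""
    stack = []  # indices whose values are non-decreasing from bottom to top
    out = []
    for i, v in enumerate(s):
        # Drop earlier elements that are strictly larger than the current value.
        while stack and s[stack[-1]] > v:
            stack.pop()
        out.append(i - stack[-1] if stack else 0)
        stack.append(i)
    return out


def ct_match(x, y):
    """Two equal-length sequences match if their parent-distance encodings agree."""
    return len(x) == len(y) and parent_distance(x) == parent_distance(y)


def count_occurrences(text, pattern):
    """Naive O(n*m) count of windows of text that Cartesian-tree match pattern."""
    m = len(pattern)
    target = parent_distance(pattern)
    return sum(
        parent_distance(text[i:i + m]) == target
        for i in range(len(text) - m + 1)
    )
```

    A compact index would answer the same count query without rescanning the text; the sketch only fixes what "match" means.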

    Simple Order-Isomorphic Matching Index with Expected Compact Space

    In this paper, we present a novel indexing method for the order-isomorphic pattern matching problem (also known as order-preserving pattern matching, or consecutive permutation matching), in which two equal-length strings X and Y are defined to match when X[i] < X[j] iff Y[i] < Y[j] for 0 ≤ i,j < |X|. We observe an interesting relation between order-isomorphic matching and the insertion process of a binary search tree, based on which we propose a data structure that not only has a concise structure comprising only two wavelet trees but also provides a surprisingly simple searching algorithm. In the average-case analysis, the proposed method requires O(R(T)) bits, and it is capable of answering a count query in O(R(P)) time and reporting an occurrence in O(lg |T|) time, where T and P are the text and the pattern string, respectively; for a string X, R(X) is the total time taken to construct the binary search tree by successively inserting the keys X[|X|-1],…,X[0] at the root, and its expected value is O(|X| lg σ), where σ is the alphabet size. Furthermore, the proposed method can be viewed as a generalization of some other methods, including several heuristics and restricted versions described in previous studies in the literature.
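
    The match condition stated in this abstract can be checked directly from its definition. The sketch below is a quadratic-time illustration of that definition only, not the wavelet-tree index the paper proposes; the function name is hypothetical.

```python
def order_isomorphic(x, y):
    """True when x[i] < x[j] holds exactly when y[i] < y[j], for every pair of positions."""
    if len(x) != len(y):
        return False
    n = len(x)
    return all(
        (x[i] < x[j]) == (y[i] < y[j])
        for i in range(n)
        for j in range(n)
    )
```

    Under this definition equal values must also line up: [1, 1, 2] matches [5, 5, 9] but not [5, 6, 9].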

    Indexing Isodirectional Pointer Sequences

    Many sequential and temporal data have dependency relationships among their elements, which can be represented as a sequence of pointers. In this paper, we introduce a new string matching problem on a particular type of string, which we call an isodirectional pointer sequence, in which each entry has a pointer to another entry. The proposed problem is not only a formalization of real-world dependency matching problems, but also a generalization of variants of the string matching problem such as parameterized pattern matching and Cartesian tree matching. We present a (2n lg σ + 2n + o(n))-bit index that preprocesses the text T[1:n] so as to count the number of occurrences of a pattern P[1:m] in O(m lg σ) time, where σ is the number of distinct lengths of pointers in T. Our index is also easy to implement in practice because it consists of wavelet trees and a range maximum query index, which are widely used building blocks in many other compact data structures. By compressing the wavelet trees, the index can also be stored in 2nH^*_0(T) + 2n + o(n) bits, where H^*_0(T) is the 0-th order empirical entropy of the distribution of pointer lengths of T.

    A Scaffold Analysis Tool Using Mate-Pair Information in Genome Sequencing

    We have developed a Windows-based program, ConPath, as a scaffold analyzer. ConPath constructs scaffolds by ordering and orienting separate sequence contigs, exploiting the mate-pair information between contig pairs. Our algorithm builds directed graphs from link information and traverses them to find the longest acyclic graphs. Using end read pairs of fixed-size mate-pair libraries, ConPath determines the relative orientations of all contigs, estimates the gap size of each adjacent contig pair, and reports incorrect assembly information by validating orientations and gap sizes. We have used ConPath in more than 10 microbial genome projects, including Mannheimia succiniciproducens and Vibrio vulnificus, where we verified contig assembly and identified several erroneous contigs using the four types of error defined in ConPath. ConPath also supports convenient features and viewers that permit investigation of each contig in detail; these include a contig viewer, a scaffold viewer, an edge information list, a mate-pair list, and the printing of complex scaffold structures.

    Photo Quality Assessment based on a Focusing Map to Consider Shallow Depth of Field

    The proliferation of and advances in digital cameras encourage people to take many photos. As a result, the number of photos that people can access is increasing exponentially, and selecting good-quality photos is becoming burdensome. In this paper, we propose a novel method to evaluate photo quality that considers DoF (depth of field) based on a focusing map. The focusing map is a form of saliency map classified into four levels based on the spatial distribution of Canny edges. We implemented it in a CUDA environment to speed up focusing map generation. To evaluate our method, we tested our feature on 206 photos classified into four groups and then compared the results with a photo set manually classified by a user. The proposed measure efficiently assesses photos with DoF. In particular, the expert group, who used DSLR cameras, agreed that our photo assessment measure is useful.

    A new cognition-based chat system for avatar agents in virtual space


    Comparative Evaluation of Intron Prediction Methods and Detection of Plant Genome Annotation Using Intron Length Distributions

    Intron prediction is an important problem in constantly updated genome annotation. Using the genomes of two model plants (rice and Arabidopsis), we compared two well-known intron prediction tools: the BLAST-Like Alignment Tool (BLAT) and Sim4cc. The results showed that each tool had its own advantages and disadvantages. BLAT predicted more than 99% of all genomic introns with a small number of false-positive introns. Sim4cc was successful at finding the correct introns with a false-negative rate of 1.02% to 4.85%, but it needed a longer run time than BLAT. Further, we evaluated the intron information of 10 complete plant genomes. As non-coding sequences, introns are not constrained in length by a triplet codon frame, so intron lengths fall into three phases: a multiple of three bases (3n), a multiple of three bases plus one (3n + 1), and a multiple of three bases plus two (3n + 2). It was widely accepted that the percentages of 3n, 3n + 1, and 3n + 2 introns were quite similar within genomes. Our studies showed that 80% (8/10) of the species had similar numbers of introns in the three phases. The percentage of 3n introns in Ostreococcus lucimarinus was excessive (47.7%), while in Ostreococcus tauri it was deficient (29.1%). This discrepancy could have been the result of errors in intron prediction. We suggest that a three-phase evaluation is a fast and effective method of detecting intron annotation problems.
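
    The three-phase evaluation described in this abstract comes down to tallying intron lengths by their remainder modulo 3. The sketch below is a minimal illustration under that reading; the function name and the example lengths are hypothetical and do not come from the study.

```python
from collections import Counter


def phase_percentages(lengths):
    """Return the percentage of intron lengths in each phase: 3n, 3n+1, 3n+2."""
    if not lengths:
        raise ValueError("at least one intron length is required")
    counts = Counter(length % 3 for length in lengths)
    total = len(lengths)
    labels = ("3n", "3n+1", "3n+2")
    return {label: 100.0 * counts[r] / total for r, label in enumerate(labels)}


# Hypothetical intron lengths in bases, for illustration only.
example = phase_percentages([90, 91, 92, 93])
```

    A genome whose 3n share departs strongly from roughly one third, as reported for the two Ostreococcus species, would stand out in such a tally.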

    Spanning Closed Trail and Hamiltonian Cycle in Grid Graphs

    In this paper we study trail routing and Hamiltonian cycles in classes of grid graphs, polycube and polymino. A spanning closed trail is an Eulerian subgraph containing all vertices of a given graph. For general grid graphs we prove that the problem of finding such a trail is NP-complete, and for a wide subclass of grid graphs, called polymino, we give an optimal algorithm that finds one if it exists. For polycube graphs we prove that every polycube has a spanning closed trail. Finally we show that the product of a graph G with a simple path of length n, G × P_n, is Hamiltonian for all n ≥ 2 if G is a polymino with a perfect matching.

    1 Introduction
    A spanning closed trail (for brevity, trail) can be considered a relaxation between two cycles, the Hamiltonian and the Eulerian cycle. While Hamiltonian routing is to visit all vertices (at most once) with some edges (at most once) and Eulerian routing is to visit all edges (at most once) with all vertices (at least once), this trail routing is to visit all vert..

    Complex deformable objects in virtual reality
